Articulation-aware Canonical Surface Mapping
We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that
indicates the mapping from 2D pixels to corresponding points on a canonical
template shape, and 2) inferring the articulation and pose of the template
corresponding to the input image. While previous approaches rely on keypoint
supervision for learning, we present an approach that can learn without such
annotations. Our key insight is that these tasks are geometrically related, and
we can obtain supervisory signal via enforcing consistency among the
predictions. We present results across a diverse set of animal object
categories, showing that our method can learn articulation and CSM prediction
from image collections using only foreground mask labels for training. We
empirically show that allowing articulation helps learn more accurate CSM
prediction, and that enforcing the consistency with predicted CSM is similarly
critical for learning meaningful articulation.
Comment: To appear at CVPR 2020; project page:
https://nileshkulkarni.github.io/acsm
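Below is a minimal sketch of the geometric consistency idea the abstract describes: a pixel's predicted canonical point, once transformed by the predicted articulation and projected by the predicted camera, should land back on that same pixel, with the loss weighted by the foreground mask. All names, shapes, and the weak-perspective camera are illustrative assumptions, not the authors' implementation.

```python
import torch

def pixel_grid(h, w):
    """Pixel coordinates in [-1, 1], shape (h, w, 2)."""
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=-1)

def consistency_loss(canonical_pts, part_ids, rots, trans, scale, mask):
    """
    Hypothetical shapes, for illustration only:
    canonical_pts: (H, W, 3) predicted template point per pixel (the CSM).
    part_ids:      (H, W) long, which articulated part each point lies on.
    rots:          (P, 3, 3) predicted per-part rotations (articulation + pose).
    trans:         (P, 3) predicted per-part translations.
    scale:         scalar weak-perspective camera scale.
    mask:          (H, W) foreground mask.
    """
    h, w, _ = canonical_pts.shape
    R = rots[part_ids]                              # (H, W, 3, 3)
    t = trans[part_ids]                             # (H, W, 3)
    # Articulate each pixel's canonical point with its part's transform.
    posed = torch.einsum("hwij,hwj->hwi", R, canonical_pts) + t
    # Weak-perspective projection back into the image.
    reproj = scale * posed[..., :2]                 # (H, W, 2)
    # Each foreground pixel should reproject onto itself.
    err = ((reproj - pixel_grid(h, w)) ** 2).sum(-1)
    return (err * mask).sum() / mask.sum().clamp(min=1.0)
```

Note that this cycle gives a supervisory signal from masks alone: both the CSM head and the articulation head must agree for the loss to vanish.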
Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene
The goal of this paper is to take a single 2D image of a scene and recover
the 3D structure in terms of a small set of factors: a layout representing the
enclosing surfaces as well as a set of objects represented in terms of shape
and pose. We propose a convolutional neural network-based approach to predict
this representation and benchmark it on a large dataset of indoor scenes. Our
experiments evaluate a number of practical design questions, demonstrate that
we can infer this representation, and quantitatively and qualitatively
demonstrate its merits compared to alternate representations.
Comment: Project URL with code: https://shubhtuls.github.io/factored3
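A minimal sketch of the factored representation the abstract describes: a single layout for the enclosing surfaces plus a small set of objects, each given by a shape and a pose. The field choices (voxel shape, disparity layout) are plausible assumptions for illustration, not the paper's exact parameterization.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class SceneObject:
    shape: np.ndarray        # e.g. a 32^3 voxel occupancy grid
    rotation: np.ndarray     # (3, 3) orientation in the camera frame
    translation: np.ndarray  # (3,) position in the camera frame
    scale: np.ndarray        # (3,) anisotropic extent

@dataclass
class FactoredScene:
    # Enclosing surfaces (walls, floor, ceiling), e.g. as a per-pixel
    # disparity map that ignores the objects in front of them.
    layout: np.ndarray
    # The small set of detected objects, each with shape and pose.
    objects: List[SceneObject]
```

A CNN predicting this representation would use one head for the layout and, per detected object, heads for the shape, rotation, translation, and scale fields above.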
Cross-task weakly supervised learning from instructional videos
In this paper we investigate learning visual models for the steps of ordinary
tasks using weak supervision via instructional narrations and an ordered list
of steps instead of strong supervision via temporal annotations. At the heart
of our approach is the observation that weakly supervised learning may be
easier if a model shares components while learning different steps: 'pour egg'
should be trained jointly with other tasks involving 'pour' and 'egg'. We
formalize this in a component model for recognizing steps and a weakly
supervised learning framework that can learn this model under temporal
constraints from narration and the list of steps. Existing data does not permit
a systematic study of sharing, so we also gather a new dataset, CrossTask,
aimed at assessing cross-task sharing. Our experiments demonstrate that sharing
across tasks improves performance, especially when done at the component level
and that our component model can parse previously unseen tasks by virtue of its
compositionality.
Comment: 18 pages, 17 figures, to be published in the proceedings of CVPR 2019.
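A minimal sketch of the component-model idea: a step such as 'pour egg' is scored by summing the scores of its shared components ('pour', 'egg'), so each component is trained jointly across every task that mentions it, and an unseen step can be scored from its components alone. The class and its names are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

class ComponentStepModel(nn.Module):
    def __init__(self, num_components, feat_dim):
        super().__init__()
        # One linear scorer per component (e.g. verbs and objects),
        # shared across all tasks and all steps.
        self.component_scorers = nn.Linear(feat_dim, num_components)

    def forward(self, frame_feats, step_components):
        """
        frame_feats:     (T, feat_dim) visual features for T video frames.
        step_components: one list of component ids per step,
                         e.g. [[id_of('pour'), id_of('egg')], ...].
        Returns (T, num_steps) per-frame step scores.
        """
        comp_scores = self.component_scorers(frame_feats)   # (T, C)
        # A step's score is the sum of its components' scores.
        step_scores = torch.stack(
            [comp_scores[:, ids].sum(dim=1) for ids in step_components],
            dim=1,
        )
        return step_scores
```

Because the step scorer is just a sum over component scores, adding a previously unseen step only requires listing its components; no step-specific parameters are needed.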